@Yicong-Huang
Contributor

What changes were proposed in this pull request?

This PR consolidates ArrowStreamAggArrowIterUDFSerializer with ArrowStreamAggArrowUDFSerializer.

Why are the changes needed?

When the iterator API was added for Arrow grouped aggregation UDFs, a new ArrowStreamAggArrowIterUDFSerializer class was created. However, this class is nearly identical to ArrowStreamAggArrowUDFSerializer, differing only in whether batches are processed lazily (iterator mode) or all at once (regular mode). By consolidating these two classes, we reduce code duplication and maintain consistency with similar serializer consolidations.
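As a rough illustration of the lazy versus all-at-once distinction described above, here is a minimal sketch. It is not the code in this PR: the class name `GroupedAggSerializerSketch`, the `iterator_mode` flag, and the `load_group` method are hypothetical, and plain Python lists stand in for Arrow record batches.

```python
from typing import Iterable, Iterator, List

Batch = List[int]  # stand-in for an Arrow record batch (assumption)


class GroupedAggSerializerSketch:
    """Hypothetical merged serializer; not the class from this PR."""

    def __init__(self, iterator_mode: bool = False) -> None:
        # Assumed flag selecting lazy vs. eager handling of a group's batches.
        self.iterator_mode = iterator_mode

    def load_group(self, batches: Iterable[Batch]):
        if self.iterator_mode:
            # Iterator mode: hand batches to the UDF lazily, one at a time.
            return iter(batches)
        # Regular mode: materialize the whole group before calling the UDF.
        return list(batches)


def batch_source(log: List[str]) -> Iterator[Batch]:
    """Yield three small batches, recording when each one is read."""
    for i in range(3):
        log.append(f"read batch {i}")
        yield [i, i + 1]


eager_log: List[str] = []
eager = GroupedAggSerializerSketch(iterator_mode=False).load_group(batch_source(eager_log))

lazy_log: List[str] = []
lazy = GroupedAggSerializerSketch(iterator_mode=True).load_group(batch_source(lazy_log))
```

In this sketch the eager path has read all three batches by the time `load_group` returns, while the lazy path reads nothing until the caller pulls from the iterator. Everything else can be shared, which is the motivation for keeping a single class.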

Does this PR introduce any user-facing change?

No, this is an internal refactoring that maintains backward compatibility. The API behavior remains the same from the user's perspective.

How was this patch tested?

Existing tests.

Was this patch authored or co-authored using generative AI tooling?

No

@dongjoon-hyun changed the title from "[SPARK-54438] Consolidate ArrowStreamAggArrowIterUDFSerializer into ArrowStreamAggArrowUDFSerializer" to "[SPARK-54438][PYTHON] Consolidate ArrowStreamAggArrowIterUDFSerializer into ArrowStreamAggArrowUDFSerializer" on Dec 5, 2025
@Yicong-Huang force-pushed the SPARK-54438/refactor/consolidate-serde-for-sql-grouped-agg-arrow branch from 03a410d to 19d35ba on December 5, 2025 07:30
@Yicong-Huang force-pushed the SPARK-54438/refactor/consolidate-serde-for-sql-grouped-agg-arrow branch from 19d35ba to 7b9526f on December 5, 2025 22:26
@Yicong-Huang
Contributor Author

@zhengruifeng could you please review this, thanks!
